An iterative approach to decision tree training for context dependent speech synthesis
Abstract
In speech synthesis with sparse training data, phonetic decision trees are frequently used to balance model complexity against the available data. In the traditional training procedure, the decision trees are constructed after the parameters for each phone have been optimized with the EM algorithm. This paper proposes an iterative re-optimization algorithm in which the decision tree is re-learned after every iteration of the EM algorithm. The new procedure is compared with the original one by training parameters for MFCC and F0 features using an EDHMM on data from the Boston University Radio Speech Corpus. A convergence proof is presented, and experimental tests demonstrate that iterative re-optimization yields statistically significant improvements in test-corpus log-likelihood.
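The difference between the two training schedules described in the abstract can be sketched as control flow. This is a minimal illustration only, not the paper's implementation: the names `train_traditional`, `train_iterative`, `em_iteration`, `build_tree`, and `trace_schedule` are hypothetical, and the EM update and tree-growing steps are stand-in stubs that merely record the order in which they are called.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical signatures (assumptions, not taken from the paper):
#   em_iteration(params, tree) -> updated params (one E-step plus one M-step)
#   build_tree(params)         -> decision tree learned from current params
EmIteration = Callable[[Any, Optional[Any]], Any]
BuildTree = Callable[[Any], Any]


def train_traditional(params: Any, em_iteration: EmIteration,
                      build_tree: BuildTree, n_iters: int) -> Tuple[Any, Any]:
    """Run every EM iteration first, then construct the tree once."""
    for _ in range(n_iters):
        params = em_iteration(params, None)  # no tree exists yet
    tree = build_tree(params)
    return params, tree


def train_iterative(params: Any, em_iteration: EmIteration,
                    build_tree: BuildTree, n_iters: int) -> Tuple[Any, Any]:
    """Re-learn the tree after each EM iteration."""
    tree = None
    for _ in range(n_iters):
        params = em_iteration(params, tree)  # uses the most recent tree
        tree = build_tree(params)            # tree is rebuilt every pass
    return params, tree


def trace_schedule(trainer: Callable[..., Tuple[Any, Any]],
                   n_iters: int) -> List[str]:
    """Run a trainer with stub steps and return the order of calls."""
    calls: List[str] = []

    def em_iteration(params: Any, tree: Optional[Any]) -> Any:
        calls.append("em")
        return params  # stub: parameters are left unchanged

    def build_tree(params: Any) -> Any:
        calls.append("tree")
        return "tree"  # stub: placeholder tree object

    trainer(None, em_iteration, build_tree, n_iters)
    return calls


print(trace_schedule(train_traditional, 3))  # ['em', 'em', 'em', 'tree']
print(trace_schedule(train_iterative, 3))    # ['em', 'tree', 'em', 'tree', 'em', 'tree']
```

In a real system the stubs would be replaced by an actual EM update for the chosen model and a likelihood-based tree-growing routine; the sketch only shows where the tree construction sits relative to the EM loop.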
Related resources
Context adaptive training with factorized decision trees for HMM-based speech synthesis
To achieve natural, high-quality synthesised speech in HMM-based speech synthesis, the effective modelling of complex acoustic and linguistic contexts is critical. Traditional approaches use context-dependent HMMs with decision tree based parameter clustering to model the full combination of contexts. However, weak contexts, such as word-level emphasis in neutral speech, are difficult to capture ...
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
A Context Clustering Technique for Average Voice Models
This paper describes a new context clustering technique for the average voice model, a set of speaker-independent speech synthesis units. In the technique, we first train speaker-dependent models using a multi-speaker speech database, and then construct a decision tree common to these speaker-dependent models for context clustering. When a node of the decision tree is split, only the context...
Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis
To achieve natural, high-quality synthesized speech in HMM-based speech synthesis, the effective modelling of complex acoustic and linguistic contexts is critical. Traditional approaches use context-dependent HMMs with decision tree based parameter clustering to model the full combination of contexts. However, weak contexts, such as word-level emphasis in natural speech, are difficult to captu...
Using Bayesian Networks to find relevant context features for HMM-based speech synthesis
Speech units are highly context-dependent, so taking contextual features into account is essential for speech modelling. Context is employed in HMM-based text-to-speech synthesis systems via context-dependent phone models. A very wide context is taken into account, represented by a large set of contextual factors. However, most of these factors probably have no significant influence on t...